Abstract

In biology, information flows from the environment to the genome by the process 
of natural selection. But it has not been clear precisely what sort of information 
metric properly describes natural selection. Here, I show that Fisher information 
arises as the intrinsic metric of natural selection and evolutionary dynamics. Max- 
imizing the amount of Fisher information about the environment captured by the 
population leads to Fisher's fundamental theorem of natural selection, the most 
profound statement about how natural selection influences evolutionary dynam- 
ics. I also show a relation between Fisher information and Shannon information 
(entropy) that may help to unify the correspondence between information and dy- 
namics. Finally, I discuss possible connections between the fundamental role of 
Fisher information in statistics, biology, and other fields of science. 
Despite the pervading importance of selection in science and life,
there has been no abstraction and generalization from genetical selection 
to obtain a general selection theory and general selection mathematics 
. . . Thus one might say that "selection theory" is a theory waiting to 
be born — much as communication theory was 50 years ago. Probably 
the main lack that has been holding back any development of a gen- 
eral selection theory is lack of a clear concept of the general nature or 
meaning of "selection". . . 

Probably the single most important prerequisite for Shannon's fa- 
mous 1948 paper on "A Mathematical Theory of Communication" was 
the definition of "information" given by Hartley in 1928, for it was 
impossible to have a successful mathematical theory of communication 
without having a clear concept of the commodity "information" that a 
communication system deals with. (Price, 1995) 

Introduction 

Brillouin (1956) distinguished two types of information. First, a natural phe- 
nomenon contains an intrinsic amount of information. Second, observation trans- 
fers information about the phenomenon to the data. Observations often do not 
completely capture all information in the phenomenon. Wheeler (1992) and Frieden 
(2004) suggested that the particular form of dynamics in observed systems arises 
from the flow of information from phenomena to observation. 

These concepts of information flow seem similar to the process of evolutionary 
change by natural selection. In biology, a population "measures" the intrinsic in- 
formation in the environment by differential reproduction of individuals with vary- 
ing phenotypes. This fluctuation of phenotype frequencies transfers information 
to the population through changes in the frequencies of the hereditary particles. 
However, the population does not fully capture all of the intrinsic information 
in the frequency fluctuations caused by differential reproduction, because only a 
fraction of phenotypic information flows to the next generation via changes in the 
frequencies of the hereditary particles. 
The analogy of natural selection as measurement and the analogy of heredity as
the partial capture of the information in differential reproduction seem reasonable. 
But how close can we come to a description of these processes in terms of a 
formal measure of information? To study this question, I developed Frieden's 
(2004) conjecture that Fisher information is the metric by which to understand 
the relation between measurement and dynamics. Fisher information provides the 
foundation for much of the classical theory of statistical inference (Fisher, 1922). 
But the role of Fisher information in the conjectures of Brillouin, Wheeler, and 
Frieden remains an open problem. 

I show that maximizing the capture of Fisher information by the hereditary 
particles gives rise directly to Fisher's (1958) fundamental theorem of natural se- 
lection, the most profound statement of evolutionary dynamics. I also extend 
Fisher information to account for observations correlated with the transfer of en- 
vironmental information to the population, leading to a more general form that 
is equivalent to the Price equation for the evolutionary change in an arbitrarily 
defined character (Price, 1970, 1972a). My analyses show that the Price equation 
and general aspects of evolutionary dynamics have a natural informational metric 
that derives from Fisher information. 

Although I show a formal match between Fisher information, the fundamental 
theorem of natural selection, and other key aspects of evolutionary analysis, I 
do not resolve several issues of interpretation. In particular, it remains unclear 
whether the link between Fisher information and the evolutionary dynamics of 
natural selection is just an interesting analogy or represents a deeper insight into 
the structure of measurement, information, and dynamics. My demonstration 
that the particular form taken by evolutionary dynamics arises naturally from 
Fisher information suggests something deeper, but the problem remains an open 
challenge. 

Overview 

In the study of evolutionary dynamics, one must relate various quantities that 
change and the processes that cause change. Consider, for example, the frequencies 
of hereditary units, such as genotypes, genes, vertically transmitted pathogens, and
maternal effects. The frequencies of such hereditary units may be influenced by 
selection, transmission, and mixing during recombination and sexual reproduction. 
It is possible to express the relations between these quantities and processes in a 
wide variety of seemingly distinct ways, and with diverse notations. However, a 
single formalism exists beneath all of those expressions. That underlying formalism 
and several alternative expressions can be regarded as already reasonably well 
known. 

Why, then, should one look for new alternative expressions? In my opinion, 
it is useful to understand more deeply the underlying formalism and to study the 
connections between evolutionary dynamics and concepts that have been developed 
in other fields of study. The value of such understanding, based on analysis of 
alternative expressions, necessarily has a strongly subjective component. There 
can never be a definitive argument against the claim that alternative expressions 
simply reformulate dynamics by notational change. In defense of developing this 
work, one can reasonably say that natural selection is among the most profound 
processes in the natural world, and any further insight that can potentially be 
obtained about natural selection is certainly worth the effort. 

To start, let us place natural selection in the context of evolutionary change. 
The complexity of evolutionary dynamics arises in part from the three distinct lev- 
els of change. First, characteristics, or phenotypes, determine how natural selec- 
tion influences relative success, and the frequency distribution of character changes 
over time partly in response to differences in success associated with characters. 
Second, individuals carry and transmit hereditary particles that influence their 
characters. We usually associate hereditary particles with genes, but any trans- 
mitted entity could influence characters and could be tracked over time. Third, 
each individual carries a particular array of hereditary particles, the array usually 
called the genotype. 

The relations between the genotype and the hereditary particles create one 
of the complexities of evolutionary dynamics. In many organisms, individuals 
transmit only a subset of their hereditary particles, and combine their contribution
of particles with another individual to make offspring with a new genotype. Thus, 
differences in success do not directly change the frequency of genotypes. Rather,
variation in genotypic success changes the frequency of the hereditary particles 
that are transmitted to make new combinations of genotypes in offspring. 

Further complexities arise because particles spontaneously change state (mu- 
tation), the effect of each particle varies with the combination of other particles 
with which it lives in genotypes, the environment changes through the changes 
in evolving organismal characters, and the environment changes for other reasons 
outside of the evolutionary system. Because a vast number of outside processes 
potentially influence a particular evolutionary system, no complete description of 
evolutionary change can be developed from first principles. Even within the
confines of a closed system of evolutionary genetics, no description can capture
both the simple generalities and the complexities of each situation.

In this paper, I focus on partial analyses of dynamics that highlight simple 
generalities common to all evolutionary systems. I emphasize the role of natural 
selection in evolutionary change, because selection is perhaps the most important 
component of evolutionary change, and because one can draw some very clear 
conclusions about the role of selection in dynamics. Other factors are often impor- 
tant, but do not lend themselves to simple and general insights about evolutionary 
dynamics. 

I start with the formal mathematical connections between Fisher information 
and the properties of natural selection, without much comment on deeper aspects 
of interpretation. I then return to the problems of interpretation and alternative 
measures of information. 

Fisher information 

Fisher information measures how much information observations provide about an 
unknown parameter of a probability distribution. Suppose $p(y|\theta)$ is a probability
distribution function (pdf) of a random variable $y$ with parameter $\theta$. Define the
log-likelihood of the parameter $\theta$ given an observation of the random variable, $y$,
as $L(\theta|y) = \log[p(y|\theta)]$, where I use $\log(\cdot)$ to denote the natural logarithm. Then
a simple univariate form of the Fisher information about the parameter $\theta$ is

$$F = \int \left(\frac{\partial L(\theta|y)}{\partial\theta}\right)^2 p(y|\theta)\,dy. \tag{1}$$

This equation presents the standard form most commonly found in the literature. 
I now present some variant forms to explain what Fisher information actually 
measures and to help in the study of evolutionary dynamics. 
We can write an equivalent definition of Fisher information as

$$F = -\int \frac{\partial^2 L(\theta|y)}{\partial\theta^2}\,p(y|\theta)\,dy, \tag{2}$$

which measures the expected curvature, or acceleration, of the log-likelihood function
with respect to the parameter $\theta$ (see, for example, Amari & Nagaoka, 2000).
A relatively flat log-likelihood surface means that an observation provides relatively
little information about $\theta$, whereas a strongly curved surface means that an
observation provides relatively more information.
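
As a minimal numerical sketch of the two definitions, the following compares the
squared-slope form and the curvature form for a Bernoulli variable, a case chosen
here only for illustration because its Fisher information has the known closed form
$1/[\theta(1-\theta)]$. The function names and step sizes are assumptions of this
sketch, not part of the original analysis.

```python
import math

def bernoulli_pmf(y, theta):
    """p(y | theta) for y in {0, 1}."""
    return theta if y == 1 else 1.0 - theta

def fisher_score_form(theta, h=1e-5):
    """Expected squared slope of the log-likelihood (first definition)."""
    total = 0.0
    for y in (0, 1):
        # Central finite difference for dL/dtheta.
        slope = (math.log(bernoulli_pmf(y, theta + h))
                 - math.log(bernoulli_pmf(y, theta - h))) / (2 * h)
        total += bernoulli_pmf(y, theta) * slope ** 2
    return total

def fisher_curvature_form(theta, h=1e-4):
    """Negative expected curvature of the log-likelihood (second definition)."""
    total = 0.0
    for y in (0, 1):
        # Second central difference for d^2 L / dtheta^2.
        curvature = (math.log(bernoulli_pmf(y, theta + h))
                     - 2 * math.log(bernoulli_pmf(y, theta))
                     + math.log(bernoulli_pmf(y, theta - h))) / h ** 2
        total -= bernoulli_pmf(y, theta) * curvature
    return total

theta = 0.3
exact = 1.0 / (theta * (1.0 - theta))  # closed form for a Bernoulli variable
print(fisher_score_form(theta), fisher_curvature_form(theta), exact)
```

The two numerical values should agree with each other and with the closed form up
to finite-difference error.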

With regard to evolutionary dynamics, a more useful form emphasizes that 
Fisher information measures the distance between two probability distributions 
for a given change in the parameter $\theta$. To understand this distance interpretation,
let us simplify the notation by defining $p_y = p(y|\theta)$ and $p'_y = p(y|\theta + d\theta)$. Let us
also work with a discrete probability distribution, such that $y = 1, \ldots, D$. Then,
an alternative form of Eq. (1) can be written as

$$F = \sum_y p_y \left(\frac{\log p'_y - \log p_y}{d\theta}\right)^2.$$

Think of the probability distribution as a point in logarithmically scaled $D$-dimensional
space, each dimension weighted by the square root of the frequency of that dimension
in the initial distribution. For the initial distribution at $\theta$, the corresponding
point in $D$ space is given by the vector $\mathbf{v} = \{\sqrt{p_y}\log(p_y)\}$ for $y = 1, \ldots, D$;
for the shifted distribution at $\theta' = \theta + d\theta$, the corresponding point is given by
$\mathbf{v}' = \{\sqrt{p_y}\log(p'_y)\}$. Then $F$ measures the square of the euclidean distance
between these two vectors divided by $d\theta^2$. In other words, $F$ measures, on a
logarithmic scale weighted by the initial frequencies, the squared euclidean distance
between the distributions relative to the scale change defined by $d\theta$.
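
The distance interpretation can be checked numerically. The sketch below assumes a
hypothetical one-parameter family, $p_y \propto \exp(\theta c_y)$, chosen only because
its Fisher information is known to equal the variance of the values $c_y$; the family,
the values in `c`, and the function names are illustrative assumptions.

```python
import math

def distribution(theta, c):
    """Hypothetical family: p_y proportional to exp(theta * c_y)."""
    weights = [math.exp(theta * cy) for cy in c]
    total = sum(weights)
    return [w / total for w in weights]

def fisher_distance_form(theta, dtheta, c):
    """Squared distance between log-scaled distributions, divided by dtheta**2.

    Each dimension is weighted by the initial frequency p_y.
    """
    p = distribution(theta, c)
    p_shift = distribution(theta + dtheta, c)
    return sum(py * ((math.log(qy) - math.log(py)) / dtheta) ** 2
               for py, qy in zip(p, p_shift))

c = [0.0, 1.0, 3.0]
theta = 0.5
p = distribution(theta, c)
mean_c = sum(py * cy for py, cy in zip(p, c))
var_c = sum(py * (cy - mean_c) ** 2 for py, cy in zip(p, c))

# For this family the Fisher information equals Var(c).
print(fisher_distance_form(theta, 1e-4, c), var_c)
```

For a small step `dtheta`, the scaled squared distance should approach the variance
of `c`, which is the Fisher information of this particular family.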
A slight notational change emphasizes this interpretation of Fisher information
as a distance between two distributions. Write the slope of the log-likelihood
function with respect to $\theta$ as $\dot{L}_y = d\log(p_y)/d\theta = \dot{\log}(p_y) = \dot{p}_y/p_y$, where the
overdot means differentiation with respect to $\theta$. Then

$$F = \sum_y p_y \left(\frac{\dot{p}_y}{p_y}\right)^2 = \sum_y p_y \dot{L}_y^2 = \sum_y \dot{p}_y\,\dot{\log}(p_y), \tag{3}$$

which emphasizes that Fisher information measures a squared distance between 
distributions when each dimension is scaled to give the fractional change in the 
distribution — logarithms are simply a scaling to measure fractional change. This 
equation also gives us the first hint that Fisher information may be related to 
evolutionary dynamics: the $\dot{p}_y$ may be interpreted as the evolutionary change in
frequency of the $y$th type in the population, where we may choose to label types
by phenotype, genotype, or any other classification.

The rightmost form in Eq. (3), $\sum_y \dot{p}_y\,\dot{\log}(p_y)$, suggests that Fisher information
is related to the Shannon information measure, or entropy, $-\sum_y p_y \log(p_y)$. Later,
I will show an equivalence relation between Fisher information and Shannon
information with regard to the study of dynamics.



Dynamics of type frequencies: selection 

I now relate Fisher information to one part of evolutionary dynamics: the change 
in the frequencies of types caused by variation in success. I use the word fitness 
to denote a measure of success. 

In evolutionary dynamics, the fitness of a type defines the frequency of that 
type after evolutionary change. Thus, we write $p'_y = p_y(w_y/\bar{w})$, where $w_y$ is the
fitness of the $y$th type, and $\bar{w} = \sum_y p_y w_y$ is the average fitness. Recall that we may
use y to classify by any kind of type. 

It is very important to note the particular definition of fitness used in what 
follows. Here, $w_y$ is proportional to the fraction of the second population that
derives from (maps to) type $y$ in the first population. Thus, $p'_y$ does not mean the
fraction of the population at $\theta'$ of type $y$, but rather the fraction of the population
at $\theta'$ that derives from type $y$ at $\theta$ (Frank, 1995, 1997; Price, 1995).
The fitness measures, $w$, can be thought of in terms of the number of progeny
derived from each type. In particular, let the number of individuals of type $y$ at
$\theta$ be $N_y = Np_y$, where $N$ is the total size of the population. Similarly, at $\theta'$, let
$N'_y = N'p'_y$. Then $\bar{w} = N'/N$, and $w_y = N'_y/N_y$.
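
A short sketch of this bookkeeping, using made-up counts, shows that fitness
defined from descendant numbers reproduces both the average fitness $N'/N$ and the
mapped frequencies $p'_y$. The counts and variable names are hypothetical.

```python
def frequencies(counts):
    """Convert counts to frequencies."""
    total = sum(counts)
    return [n / total for n in counts]

# Hypothetical counts: N_y before change, and the descendants N'_y that
# trace back to each ancestral type y.
N = [50, 30, 20]
N_prime = [60, 45, 15]

p = frequencies(N)
w = [n1 / n0 for n0, n1 in zip(N, N_prime)]       # w_y = N'_y / N_y
w_bar = sum(py * wy for py, wy in zip(p, w))       # average fitness
p_prime = [py * wy / w_bar for py, wy in zip(p, w)]

print(w_bar, sum(N_prime) / sum(N))        # both give N'/N
print(p_prime, frequencies(N_prime))       # both give N'_y / N'
```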

Fitness can alternatively be measured by the rate of change in numbers, sometimes
called the Malthusian rate of increase, $m$. To obtain the particular rates
of change to analyze Fisher information, we see from Eq. (3) that we need an
expression for

$$\begin{aligned}
\frac{\dot{p}_y}{p_y} &= \dot{\log}(p_y) \\
&= \dot{\log}(N_y/N) \\
&= \dot{\log}(N_y) - \dot{\log}(N) \\
&= \dot{N}_y/N_y - \dot{N}/N \\
&= m_y - \bar{m} \\
&= a_y,
\end{aligned} \tag{4}$$

where $a_y$ is called the average excess in fitness (Fisher 1958; Crow & Kimura,
1970).

Substituting into Eq. (3) yields

$$P = \sum_y p_y \left(\frac{\dot{p}_y}{p_y}\right)^2 = \sum_y p_y a_y^2, \tag{5}$$

where $P$ denotes the total Fisher information in the population about $\theta$ obtained
from the frequency fluctuations $\dot{p}_y$. Note that $P$ is also a normalized form of the
total variance in fitness in the population, because $a_y^2$ is the squared fitness deviation
of each type over the scale $d\theta^2$. The value of $P$ also denotes the square of a
standardized euclidean distance between the initial population and the population
after evolutionary change, as discussed in the section above on Fisher information.
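
The equality between the variance in fitness and the frequency-fluctuation form can
be illustrated with a small numerical sketch. It assumes the first-order relation
$w_y = 1 + m_y\,d\theta$ to generate a small finite step; the frequencies, Malthusian
rates, and function names are hypothetical.

```python
def total_fisher_information(p, m):
    """P = sum_y p_y * a_y**2, with a_y = m_y - m_bar (variance in fitness)."""
    m_bar = sum(py * my for py, my in zip(p, m))
    return sum(py * (my - m_bar) ** 2 for py, my in zip(p, m))

def fluctuation_form(p, m, dtheta):
    """sum_y p_y * (pdot_y / p_y)**2, with pdot_y taken from a small finite step."""
    w = [1.0 + my * dtheta for my in m]            # assumed: w_y = 1 + m_y * dtheta
    w_bar = sum(py * wy for py, wy in zip(p, w))
    p_next = [py * wy / w_bar for py, wy in zip(p, w)]
    return sum(py * ((qy - py) / (py * dtheta)) ** 2
               for py, qy in zip(p, p_next))

p = [0.5, 0.3, 0.2]          # hypothetical type frequencies
m = [0.10, -0.05, 0.20]      # hypothetical Malthusian rates

print(total_fisher_information(p, m), fluctuation_form(p, m, 1e-6))
```

The two values agree up to terms of order `dtheta`, which is the sense in which the
continuous analysis treats them as equal.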

Frieden et al. (2001) noted that total evolutionary change, P, is a measure of 
Fisher information. In particular, they wrote the total Fisher information as
$F = \sum_y q_y(\dot{q}_y/q_y)^2$, where $\dot{q} = dq/dt$ is the time derivative of type frequencies. Given
that formulation, the Fisher information in the fluctuations of type frequencies
provides information about time rather than about the environment. However,
Frieden et al. failed to advance beyond this stage of analysis because of their 
nonintuitive conclusion that information about time is the focal metric, and their 
lack of analysis with regard to the transmission of information via the hereditary 
particles. To move ahead, I first show that time is not the correct metric about 
which information should be interpreted. I then analyze the flow of information 
to the hereditary particles. 

The scale of change 

Let us review Fisher information. We begin with a probability distribution, p, 
that depends on some parameter, $\theta$. We may interpret the probability distribution
as the frequency of various types in a population, for example, the frequency of
genotypes or phenotypes. The amount of Fisher information about $\theta$ provided by
an observation depends on how much the probability distribution, $p$, changes as $\theta$
changes.

The total Fisher information about $\theta$ is equal to a measure of the squared
distance, $d\log(p)^2$, that the probability distribution moves with respect to the
scale of dynamics set by the underlying parameter, $d\theta^2$. From the definition given
in Eq. (2), Fisher information measures the observed acceleration of frequencies,
and equates the observed acceleration to information about the unobserved force,
$\theta$.

The typical view of evolutionary dynamics would be to consider $\theta$ as a measure
of time, and to follow the dynamics with respect to changes in time. But a simple 
example suggests that using time alone as the scale of measure is not sufficient. 

Consider a standard dynamical equation for the change in frequency with respect
to time, which takes the form

$$\frac{dp_y}{dt} = s_y p_y.$$

Here, $dt$ is the time scale over which change occurs, and $s_y$ is the rate at which
change occurs for the $y$th type, subject to the constraint that the total change
in frequencies is zero, $\sum_y dp_y = \sum_y s_y p_y\,dt = 0$. The units on $dt$ are measured in
time, $t$, and the units on the rate parameters, $s_y$, are measured in units $1/t$. Note
that each $s_y$ may depend on the frequencies of other types, thus this expression is
a short-hand for what may be complex nonlinear dynamics and is only meant to
emphasize the dimensions of change rather than the details of the dynamics.
Now make the substitution $s_y = sa_y$, yielding

$$\frac{dp_y}{dt} = s p_y a_y.$$

The dimensions on $s$ are $1/t$, thus $s\,dt = d\theta$ is the dimensionless scale over which
change is measured. Because $p_y$ is a frequency, which is dimensionless, $a_y$ must also
be dimensionless. Thus, all dynamical expressions scaled over time have natural
nondimensional analogs of the form

$$\frac{dp_y}{s\,dt} = \frac{dp_y}{d\theta} = p_y a_y.$$

Similarly, we could measure change over space by $dL$, with dimensions $L$ for
some length scale, and the rate of change with length by $\beta$, with units $1/L$, so
that $\beta\,dL$ would be the dimensionless scale over which one measures dynamics.

These lines of evidence suggest that a proper interpretation of the dimensionless 
scale of change, $d\theta$, would, for example, be $d\theta = s\,dt$ or $d\theta = \beta\,dL$. We may use
other dimensionless scales, but in each case, the dimensionless quantity would be 
a change per unit scale multiplied by a length scale. 

Given the simple path from standard dynamics to the dimensionless interpretation
of $\theta$, why did it take me so long to arrive at this conclusion? Because my
point of departure was Fisher information, and I derived from Fisher information
a view of dynamics as a distance between an initial population and a population
changed over some arbitrary change in parameter, $\theta$. That path from Fisher
information left the meaning of $\theta$ unspecified, and so we had to connect a distance
view of dynamics to a standard view based in time, and find $\theta$ as a dimensionless
scale of change.

The path from Fisher information to dynamics raises another issue of interpre- 
tation. In typical dynamical analyses, we start with given initial conditions and 
rates of change, and then calculate the changed population. An analysis of evolu- 
tionary dynamics that begins with Fisher information follows a different path. We 
start with observations of an initial population and a changed population, and use
the observed distance between those populations to gain information about the
unobserved environmental forces that determine the scale of change, $\theta$. Thus, in

$$dp_y = p'_y - p_y = p_y a_y\,d\theta,$$

the value of $a_y = dp_y/(p_y\,d\theta) = \dot{p}_y/p_y$ measures the observed frequency perturbation
over the unobserved scale of change $\theta$. We use the observed set of frequency
perturbations $\{a_y\}$ to obtain information about $\theta$.

To give an example, suppose we measure changes in frequency over time, so
that $d\theta = s\,dt$. We can interpret $s$ as the environmental pressure for change per
unit time, and $dt$ as the time change over which measurements occur. The Fisher
information that we obtain from frequency changes cannot separate $s$ from
$dt$; instead we only get information about the total dimensionless change, $d\theta$.
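
A brief sketch makes the point concrete: two hypothetical scenarios with different
pressures $s$ and durations $dt$, but the same product $s\,dt$, produce the same
frequency change under the first-order update $dp_y = s\,p_y a_y\,dt$. All numbers
and names are illustrative assumptions.

```python
def step(p, a, s, dt):
    """First-order frequency update: p_y + s * p_y * a_y * dt."""
    return [py + s * py * ay * dt for py, ay in zip(p, a)]

p = [0.5, 0.3, 0.2]
a = [0.025, -0.125, 0.125]    # deviations chosen so that sum_y p_y * a_y = 0

# Same dimensionless change, s * dt = 0.02, reached in two different ways.
strong_and_brief = step(p, a, s=2.0, dt=0.01)
weak_and_long = step(p, a, s=0.5, dt=0.04)

print(strong_and_brief)
print(weak_and_long)
```

Because the observed frequencies are identical in the two scenarios, no analysis of
those frequencies alone can recover $s$ and $dt$ separately.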

We can think of $d\theta$ as the total pressure for change that the environment
applies to the initial population, that is, a measure of the mismatch between the
current population and the environment observed over the scale $d\theta$. Our measure
of Fisher information based on observations about frequency fluctuations provides 
information about this mismatch. 

To summarize, if we start with observations of frequency fluctuations, then 
we can calculate information about the mismatch with the environment over the 
dimensionless scale $d\theta$. By contrast, if we start with initial frequencies and rules
about change per some dimensional unit, we can calculate dynamics as altered 
frequencies per dimensional unit. Thus, dynamics and information about environ- 
mental mismatch describe the same system, but represent different points of view 
with regard to what we observe and what we calculate from the given quantities. 
Before we analyze these issues of interpretation more fully, it helps to consider 
from an informational perspective what else we can learn about natural selection. 

Fisher information from correlated observations 

We often have measurements about the change in characters associated with fit- 
ness, such as weight, resistance to parasites, and so on. In this section, I derive 
the Fisher information contained in observations that are correlated with fitness,
where fitness is equivalent to frequency fluctuations. In the next section, I use the 
result for correlated characters to place measures of Fisher information into the 
broader context of evolutionary dynamics given by the Price equation. 

I begin with alternative forms of the total Fisher information in the population 
with respect to $\theta$. Start with the forms given by Eq. (5)

$$P = \sum_y p_y \left(\frac{\dot{p}_y}{p_y}\right)^2 = \sum_y p_y a_y^2. \tag{6}$$

Note that the form of each sum is

$$\sum_y p_y \delta_y^2 = \mathrm{Cov}(\delta, \delta),$$

because $\bar{\delta} = 0$ in each expression in Eq. (6). Thus, we can, for example, write the
total Fisher information from frequency fluctuations as

$$P = \mathrm{Cov}(a, a).$$

In general, we can write an equivalent expression for Fisher information based on
any measurement $z$ correlated with $a$ as

$$R_{za}P = R_{za}\mathrm{Cov}(a, a) = \mathrm{Cov}(a, z),$$

where $R_{za}$ is the regression of $z$ on $a$. Note also that

$$\mathrm{Cov}(a, z) = \mathrm{Cov}(m, z),$$

because $a_y = m_y - \bar{m}$. Thus, Fisher information can be expressed in terms of the
covariance and the regression between fitness and an arbitrary character, $z$.
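
These covariance identities are easy to confirm numerically. In the sketch below,
the frequencies, Malthusian rates, and character values are hypothetical, and the
helper names are introduced only for illustration.

```python
def mean(p, x):
    """Frequency-weighted mean."""
    return sum(py * xy for py, xy in zip(p, x))

def cov(p, x, z):
    """Frequency-weighted covariance."""
    x_bar, z_bar = mean(p, x), mean(p, z)
    return sum(py * (xy - x_bar) * (zy - z_bar)
               for py, xy, zy in zip(p, x, z))

p = [0.5, 0.3, 0.2]              # type frequencies
m = [0.10, -0.05, 0.20]          # Malthusian fitness of each type
z = [1.0, 2.0, 4.0]              # hypothetical character values

m_bar = mean(p, m)
a = [my - m_bar for my in m]     # average excess in fitness

P = cov(p, a, a)                          # total Fisher information
R_za = cov(p, a, z) / cov(p, a, a)        # regression of z on a

print(R_za * P, cov(p, a, z), cov(p, m, z))
```

All three printed values coincide, because subtracting the constant $\bar{m}$ from
fitness does not change a covariance.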

It is also useful to note that the Fisher information that arises from natural
selection of phenotypes, as expressed in the prior equation, can be expressed as
a rate of change in the average value of a population. First, note from earlier
definitions of fitness that $w_y = 1 + m_y\,d\theta$ and $\bar{w} = 1 + \bar{m}\,d\theta$. Thus, ignoring terms
of order $d\theta$, we have

$$a_y = m_y - \bar{m} = (w_y/\bar{w} - 1)/d\theta,$$

allowing us to expand the covariance between a character and fitness as

$$\begin{aligned}
R_{za}P = \mathrm{Cov}(a, z) &= \mathrm{Cov}(m, z) \\
&= \sum_y p_y (m_y - \bar{m}) z_y \\
&= \Big(\sum_y p'_y z_y - \sum_y p_y z_y\Big)/d\theta \\
&= \sum_y \dot{p}_y z_y \\
&= \dot{\bar{z}}_P,
\end{aligned} \tag{7}$$

in which the subscript P denotes the focus on phenotypes and total population 
information. Here, I do not account for any changes to the character during 
transmission or caused by changes in the environment. I develop a full description 
of those additional processes of evolutionary change in the next section. 

The Price equation of evolutionary dynamics in 
terms of Fisher information 

Thus far, I have focused on the part of evolutionary change that follows from 
variation in fitness between types. In particular, Eq. (6) shows the equivalence
between the variation in fitness, $\sum_y p_y a_y^2$, and frequency fluctuations given by the
expression in terms of $\dot{p}_y$.

In this section, I derive the Price equation, a general expression for total evo- 
lutionary change that accounts in an abstract way for all evolutionary forces, and 
for any arbitrary character (Price, 1970, 1972b; Frank, 1995, 1997). I then use 
the Price equation to place natural selection and Fisher information into the wider 
context of evolutionary change. The following sections use the Price equation to 
relate the change in the frequencies of the hereditary particles to the amount of 
Fisher information transmitted over the scale $d\theta$. That change in Fisher information
through the hereditary particles leads us to the fundamental theorem of
natural selection. 

In what follows, let unprimed variables represent measurements at $\theta$ and primed
variables represent measurements at $\theta' = \theta + d\theta$. Overdots denote change with
respect to $\theta$ over the scale $d\theta = \theta' - \theta$. As before, a primed frequency, $p'_y = p_y w_y/\bar{w}$,
represents the frequency of entities at $\theta'$ derived from type $y$ at $\theta$. Similarly, for
a measurement on types, $z_y$, the value $z'_y$ represents the average value of $z$ among
those entities at $\theta'$ derived from type $y$ at $\theta$. These special definitions for the
mappings between two populations, first emphasized by Price (1995), give rise to
a more general form of the Price equation first presented by Frank (1995, 1997).

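With these definitions, the total change in the average value of some measurement,
$z$, relative to the change between $\theta'$ and $\theta$, is

$$\begin{aligned}
\dot{\bar{z}} &= \Big(\sum_y p'_y z'_y - \sum_y p_y z_y\Big)/d\theta \\
&= \Big(\sum_y p_y (w_y/\bar{w})(z_y + dz_y) - \sum_y p_y z_y\Big)/d\theta \\
&= \sum_y p_y a_y z_y + \sum_y p_y \dot{z}_y + \sum_y p_y (w_y/\bar{w} - 1)\,dz_y/d\theta \\
&= \mathrm{Cov}(m, z) + \mathrm{E}(\dot{z}) + \mathrm{Cov}(m, dz) \\
&= \mathrm{Cov}(m, z) + \mathrm{E}(\dot{z}),
\end{aligned} \tag{8}$$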

where, in this continuous analysis, we can ignore $\mathrm{Cov}(m, dz)$ as of magnitude
$dz \to 0$. Eq. (8) is a form of the Price equation. We can express the second term
alternately as

$$\mathrm{E}(\dot{z}) = \sum_y p_y \dot{z}_y = \dot{\bar{z}}_{E|P}, \tag{9}$$

where $\dot{\bar{z}}_{E|P}$ is the rate of change in the character value that has nothing to do
with changes in frequency, and thus has nothing to do with information about the
environment transferred to the population via natural selection. The subscript $E|P$
emphasizes that this environmental component arises in the context of analyzing
phenotypes and total population information, $P$.
Putting the pieces together

$$\begin{aligned}
\dot{\bar{z}} &= \mathrm{Cov}(m, z) + \mathrm{E}(\dot{z}) \\
&= \sum_y \dot{p}_y z_y + \sum_y p_y \dot{z}_y \\
&= R_{za}P + \dot{\bar{z}}_{E|P} \\
&= \dot{\bar{z}}_P + \dot{\bar{z}}_{E|P}.
\end{aligned} \tag{10}$$

Note that the total change can be expressed in terms of pure Fisher information
associated with the direct effects of frequency fluctuations, $\dot{\bar{z}}_P$, and the environmental
changes holding constant the consequences of frequency fluctuations caused
by natural selection, $\dot{\bar{z}}_{E|P}$.
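
The partition of total change can be illustrated with the exact, finite-step
version of the Price equation. In this sketch the transmission term keeps the
factor $w_y/\bar{w}$, so it lumps together the expectation and covariance terms that
the continuous analysis separates before dropping the covariance as $dz \to 0$. The
fitness and character values are hypothetical.

```python
def price_terms(p, w, z, z_prime):
    """Return (selection term, transmission term) of the discrete Price equation."""
    w_bar = sum(py * wy for py, wy in zip(p, w))
    rel = [wy / w_bar for wy in w]                       # relative fitness
    selection = sum(py * (r - 1.0) * zy
                    for py, r, zy in zip(p, rel, z))     # Cov(w / w_bar, z)
    transmission = sum(py * r * (zp - zy)
                       for py, r, zy, zp in zip(p, rel, z, z_prime))
    return selection, transmission

p = [0.5, 0.3, 0.2]
w = [1.2, 0.9, 1.5]
z = [1.0, 2.0, 4.0]
z_prime = [1.1, 1.9, 4.2]   # hypothetical average z among descendants of each type

w_bar = sum(py * wy for py, wy in zip(p, w))
p_prime = [py * wy / w_bar for py, wy in zip(p, w)]

total_change = (sum(q * zp for q, zp in zip(p_prime, z_prime))
                - sum(py * zy for py, zy in zip(p, z)))
selection, transmission = price_terms(p, w, z, z_prime)

print(total_change, selection + transmission)
```

The two printed values match exactly (up to rounding), showing that the selection
and transmission terms account for all of the change in the average character.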

Recall that Fisher information is a measure of distance between two probability 
distributions. We see here that the distance metric in Fisher information can be 
translated into the rate of evolutionary change in mean character values caused by 
natural selection. 

Information in the hereditary particles 

The previous analysis did not explicitly consider the process by which characters 
transmit between populations. Any changes in transmission are hidden within the 
environmental component of change, $\dot{\bar{z}}_{E|P}$. In biology, characters do not transmit
as units, but rather as particles such as genes and other transmissible units
that contain information about the character. Full analysis of evolutionary dy- 
namics must consider the expression of information and change in terms of the 
transmissible, hereditary particles. 

In the previous section, I developed the equations for change in terms of an 
arbitrary character, z. In this section, I focus on the character fitness, m, in order 
to connect the results to Fisher's famous fundamental theorem of natural selection, 
which is about the rate of change in fitness caused by natural selection (Fisher, 
1958; Price, 1972a; Ewens, 1989, 1992; Frank & Slatkin, 1992; Edwards, 2002). I 
also introduce the hereditary particles into the analysis. My analysis, focused on 
fitness, $m$, could be expanded to any character, $z$, correlated with fitness.

Let $y$ index subsets of the population, each subset containing a common set of
hereditary particles. Let the set defining $y$ be $\{x_{yj}\}$, where $j = 1, \ldots, M$ labels
the different kinds of hereditary particles that exist in the population. Each type,
$y$, contains $x_{yj}$ copies of particle $j$, and a total of $\sum_j x_{yj} = k$ particles.

I now make a few additional assumptions that, while not necessary, lead to a
convenient notation and to connections with classical population genetics. Assume
that each type has $L$ distinct slots (or loci), each slot containing $n$ particles, so that
$k = nL$. Further, assume that a particular particle can occur only at a particular
slot. Then the frequency of a particular particle is

$$r_j = \sum_y p_y x_{yj}/n.$$

We measure the prediction about the character of interest provided by each
particle by multiple regression (Fisher, 1958). In this case, our focal character is
fitness, $m$; as before, we use the average excess in fitness $a_y = m_y - \bar{m}$, and write

$$a_y = \sum_j \alpha_j x_{yj} + \epsilon_y,$$

where $\alpha_j$ is the partial regression coefficient of fitness on the predictor (particle)
type $j$. In genetics, $\alpha_j$ is called the average effect of the predictor (or allele) $j$
(Fisher, 1958; Crow & Kimura, 1970). If we sum both sides over the frequency of
types, $p_y$, we have, by convention, $\sum_y p_y \epsilon_y = 0$; from the definition of $r$, we
obtain $\sum_j r_j \alpha_j = 0$.

We can define $g_y = \sum_j \alpha_j x_{yj}$ as the fitness deviation predicted by the hereditary
particles; the term $g_y$ is often referred to as the breeding value. Thus, we can write

$$a_y = g_y + \epsilon_y, \tag{11}$$

which says that the observed fitness deviations and frequency fluctuations of types,
$a_y = \dot{p}_y/p_y$, are equal to the deviations predicted by the hereditary particles carried
by each type, $g_y$, and a deviation between the observed and predicted values,
$\epsilon_y$. Because $g_y$ predicts the frequency fluctuations of types, we can write those
predicted frequency fluctuations as $\dot{\gamma}_y = (\gamma'_y - p_y)/d\theta = g_y p_y$.

How should the predictive value be chosen for the average effect of each hereditary
particle, that is, how should values be assigned for the $\alpha_j$? Following Fisher
(1958), I choose the $\alpha_j$ to minimize the euclidean distance between observation
and prediction, which means that the $\alpha_j$ are partial regression coefficients obtained
by the theory of least squares. Later, I will discuss the interpretation of
why one would choose the value of the average effects of the hereditary particles 
with respect to this minimization criterion. For now, I analyze the consequences, 
which relate to Fisher information and Fisher's fundamental theorem of natural 
selection. 
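
As a minimal sketch of the least-squares choice, the example below computes the
average effects for a hypothetical single locus ($L = 1$) with $n = 2$ particles per
type and two kinds of particle. It solves the frequency-weighted normal equations
directly by Cramer's rule; the frequencies, fitness values, and function name are
illustrative assumptions rather than part of the original analysis.

```python
def average_effects(p, x, a):
    """Weighted least squares for a_y = alpha_1*x_y1 + alpha_2*x_y2 + eps_y.

    Weights are the type frequencies p_y; the 2x2 normal equations are
    solved by Cramer's rule.
    """
    s11 = sum(py * xy[0] * xy[0] for py, xy in zip(p, x))
    s12 = sum(py * xy[0] * xy[1] for py, xy in zip(p, x))
    s22 = sum(py * xy[1] * xy[1] for py, xy in zip(p, x))
    t1 = sum(py * xy[0] * ay for py, xy, ay in zip(p, x, a))
    t2 = sum(py * xy[1] * ay for py, xy, ay in zip(p, x, a))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det)

x = [(2, 0), (1, 1), (0, 2)]      # copies of particles 1 and 2 in each type
p = [0.25, 0.5, 0.25]             # type frequencies
m = [0.10, 0.30, -0.10]           # hypothetical Malthusian fitness of each type

m_bar = sum(py * my for py, my in zip(p, m))
a = [my - m_bar for my in m]                       # average excess

alpha = average_effects(p, x, a)                   # average effects
g = [alpha[0] * xy[0] + alpha[1] * xy[1] for xy in x]   # breeding values
eps = [ay - gy for ay, gy in zip(a, g)]            # residual deviations
r = [sum(py * xy[j] for py, xy in zip(p, x)) / 2 for j in (0, 1)]

print(alpha)
print(sum(py * ey for py, ey in zip(p, eps)))      # mean residual
print(r[0] * alpha[0] + r[1] * alpha[1])           # sum_j r_j * alpha_j
```

In this example the heterozygous type has the highest fitness, so the breeding
values leave a nonzero residual for every type, yet the frequency-weighted mean
residual and the sum $\sum_j r_j \alpha_j$ both vanish.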
The total squared distance between observation and prediction is the sum of the
squared deviations for each type, each squared deviation weighted by the frequency
of the particular type

$$\sum_y p_y \epsilon_y^2 = \sum_y p_y (a_y - g_y)^2. \tag{12}$$

A geometric interpretation provides some insight into the consequences of mini- 
mizing this distance, and how these measures relate to Fisher information. 

A geometric interpretation of heredity and Fisher 
information 

Suppose there are $D$ different types, that is, $y = 1, \ldots, D$. Then the observed fitness
deviations can be described as a point in $D$-dimensional space, $\mathbf{a} = \{\sqrt{p_y}\,a_y\}$,
with a squared distance from the origin of

$$P = \sum_y p_y a_y^2 = \sum_y p_y (\dot{p}_y/p_y)^2 = \mathbf{a} \cdot \mathbf{a}, \tag{13}$$

where, from Eq. (6), we obtain the equivalence to the total Fisher information,
$P$, and the sum of squared frequency deviations. The dot product notation $\mathbf{a} \cdot \mathbf{a}$
expresses the sum of the product of each dimension between two vectors.

Similarly, we can describe the predicted fitness deviations as a point
$\mathbf{g} = \{\sqrt{p_y}\,g_y\}$, with a squared distance from the origin of

$$G = \sum_y p_y g_y^2 = \sum_y p_y (\dot{\gamma}_y/p_y)^2 = \mathbf{g} \cdot \mathbf{g}, \tag{14}$$

where, as defined above, $\dot{\gamma}_y$ is the predicted frequency fluctuation of types. Here,
G is a measure of Fisher information, because it matches the definition of Fisher
information as a normalized distance between an initial probability distribution
and an altered distribution, as defined earlier.

Returning to Eq. (12), we can express the minimization of the distance between
observed frequency fluctuations, $\mathbf{a}$, and predicted fluctuations, $\mathbf{g}$, as a minimization
of the distance between the points $\mathbf{a}$ and $\mathbf{g}$. Because $\epsilon_y = a_y - g_y$, we can express
geometrically the deviations between observed and predicted fluctuations as
$\mathbf{e} = \{\sqrt{p_y}\,(a_y - g_y)\} = \mathbf{a} - \mathbf{g}$. Next, we can write in geometric notation the total
squared distance between observation and prediction in Eq. (12), using the fact
that $\mathbf{a} = \mathbf{g} + \mathbf{e}$, as

$$\begin{aligned}
\mathbf{e} \cdot \mathbf{e} &= (\mathbf{a} - \mathbf{g}) \cdot (\mathbf{a} - \mathbf{g}) \\
&= \mathbf{a} \cdot \mathbf{a} - 2(\mathbf{a} \cdot \mathbf{g}) + \mathbf{g} \cdot \mathbf{g} \\
&= \mathbf{a} \cdot \mathbf{a} - 2(\mathbf{g} + \mathbf{e}) \cdot \mathbf{g} + \mathbf{g} \cdot \mathbf{g}.
\end{aligned} \tag{15}$$

The location of $\mathbf{a}$ is set by observation. So, to minimize the squared distance
between observation and prediction, we must choose the predicted values $\mathbf{g}$ to
be as close to $\mathbf{a}$ as possible. The standard theory of least squares, based on the
principles of linear algebra, provides all of the details, which I will not derive here.
Rather, I will give the results, and briefly describe some intuitive ways in which
those results can be understood. The key is that the shortest distance between a
point and a line is obtained by the perpendicular drawn from the point to the line.

Choosing $\mathbf{g}$ to be as close as possible to $\mathbf{a}$ means that the vector from the origin
to $\mathbf{g}$ will be perpendicular to the vector between $\mathbf{g}$ and $\mathbf{a}$, denoted $\mathbf{e}$. Consequently
$\mathbf{g} \cdot \mathbf{e} = 0$. The requirement that $\mathbf{g}$ be perpendicular to $\mathbf{e}$ can be understood as
follows. Since $\mathbf{a}$ is fixed in location, the distance between $\mathbf{a}$ and $\mathbf{g}$ depends on the
length of $\mathbf{g}$ and the angle between $\mathbf{a}$ and $\mathbf{g}$ drawn as vectors from the origin.

Recall the definition of $\mathbf{g}$ given by $g_y = \sum_j \alpha_j x_{yj}$. The x values are fixed, but
we are free to choose the $\alpha_j$ in order to make $\mathbf{g}$ as close as possible to $\mathbf{a}$, subject
to the one constraint that $\sum_y p_y g_y = n \sum_j r_j \alpha_j = 0$. This constraint is still satisfied
if we multiply all $\alpha_j$ values by a constant. Thus, we can freely choose the length
of the vector $\mathbf{g}$, assuming that we have at least one degree of freedom in setting $\mathbf{g}$
after accounting for the single constraint on the $\alpha_j$. If we can set the length of $\mathbf{g}$,
then no matter what constraints there are on reducing the angle between $\mathbf{a}$ and
$\mathbf{g}$, the shortest distance between $\mathbf{a}$ and $\mathbf{g}$, given by the vector $\mathbf{e} = \mathbf{a} - \mathbf{g}$, occurs
when the vector $\mathbf{e}$ is perpendicular to $\mathbf{g}$.

Correlations in the x values can lower the number of degrees of freedom we
have for reducing the angle between observation, $\mathbf{a}$, and prediction, $\mathbf{g}$, causing an
increase in the minimum distance between observation and prediction. However,
such correlations between the x values do not prevent adjusting the length of $\mathbf{g}$
as long as there is one degree of freedom available. Thus, correlated predictors
(alleles) do not alter the conclusion that $\mathbf{g} \cdot \mathbf{e} = 0$. In terms of classical population
genetics, nonrandom mating and linkage between different alleles may cause
correlations between the alleles. The point here is that such processes do not alter the
conclusion that minimizing the distance between observation, $\mathbf{a}$, and prediction,
$\mathbf{g}$, leads to $\mathbf{g} \cdot \mathbf{e} = 0$. Using this fact in Eq. (15),

$$\mathbf{e} \cdot \mathbf{e} = \mathbf{a} \cdot \mathbf{a} - \mathbf{g} \cdot \mathbf{g}.$$
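
The perpendicularity condition used here follows from a short calculation. As a
sketch, fix the direction of the prediction vector as a unit vector $\mathbf{u}$ (my notation,
introduced only for this illustration) and let its length $c$ vary, so that $\mathbf{g} = c\,\mathbf{u}$:

```latex
\begin{align*}
\|\mathbf{a} - c\,\mathbf{u}\|^2
  &= \mathbf{a}\cdot\mathbf{a} - 2c\,(\mathbf{a}\cdot\mathbf{u}) + c^2, \\
\frac{d}{dc}\,\|\mathbf{a} - c\,\mathbf{u}\|^2 = 0
  \;&\Longrightarrow\; c = \mathbf{a}\cdot\mathbf{u}, \\
\mathbf{g}\cdot\mathbf{e} = c\,\mathbf{u}\cdot(\mathbf{a} - c\,\mathbf{u})
  &= c\,(\mathbf{a}\cdot\mathbf{u}) - c^2 = 0 .
\end{align*}
```

Whatever direction the constraints on the average effects allow, the best choice of
length therefore leaves the mismatch vector perpendicular to the prediction vector.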

Recall from Eq. (13) that $P = \mathbf{a} \cdot \mathbf{a}$, the intrinsic or total Fisher information about
the environment measured by the population in regard to the observed fluctuations
of phenotypic fitness. Similarly, from Eq. (14), $G = \mathbf{g} \cdot \mathbf{g}$, the Fisher information
about fitness captured by fluctuations in the hereditary particles, when the average
effect of each particle is obtained by minimizing the distance between the observed
and predicted fitnesses. If we define $E = \mathbf{e} \cdot \mathbf{e}$ as the difference between P and G,
then we can express the distance relations of Eq. (15) in terms of Fisher information
as

$$P - G = E,$$

where G is the portion of the total Fisher information, P, captured by the hereditary
particles, and E is the portion of the Fisher information lost by the population.
In terms of traditional genetics, P is the total variance in the population, G is the
genetic variance, and E is the environmental variance, with G/P a measure of
heritability (Crow & Kimura, 1970).

The new result here is that these traditional measures of variance are measures 
of Fisher information. I suggest that the role of variances in the fundamental 
equations of biology can be understood as measures of Fisher information. 
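
The partition of P into G and E can be checked numerically. The following is a
minimal sketch, not a calculation from this paper: it assumes a hypothetical
one-locus diploid population with three genotypes, made-up fitness values, and
predictors $x_{yj}$ equal to the number of copies of allele j carried by type y. All
variable names are illustrative.

```python
# Sketch: frequency-weighted least squares for average effects.
# All numbers below are hypothetical.

p = [0.25, 0.5, 0.25]          # frequencies p_y of types AA, Aa, aa
m = [1.0, 1.2, 0.8]            # hypothetical fitnesses m_y
x = [(2, 0), (1, 1), (0, 2)]   # x_yj: copies of allele j carried by type y

m_bar = sum(py * my for py, my in zip(p, m))
a = [my - m_bar for my in m]   # observed fitness deviations a_y


def wsum(u, v):
    """Frequency-weighted sum of products, sum_y p_y * u_y * v_y."""
    return sum(py * uy * vy for py, uy, vy in zip(p, u, v))


x1 = [row[0] for row in x]
x2 = [row[1] for row in x]

# Normal equations for minimizing sum_y p_y (a_y - alpha_1 x_y1 - alpha_2 x_y2)^2,
# solved for the two average effects by Cramer's rule.
A11, A12, A22 = wsum(x1, x1), wsum(x1, x2), wsum(x2, x2)
b1, b2 = wsum(x1, a), wsum(x2, a)
det = A11 * A22 - A12 * A12
alpha = [(b1 * A22 - A12 * b2) / det, (A11 * b2 - A12 * b1) / det]

g = [alpha[0] * u + alpha[1] * v for u, v in zip(x1, x2)]  # predicted deviations g_y
e = [ay - gy for ay, gy in zip(a, g)]                      # residuals

P = wsum(a, a)        # a . a, total
G = wsum(g, g)        # g . g, captured by the predictors
E = wsum(e, e)        # e . e, lost
g_dot_e = wsum(g, e)  # should vanish at the least-squares solution

print(f"alpha = {alpha}")
print(f"P = {P:.6f}, G = {G:.6f}, E = {E:.6f}, g.e = {g_dot_e:.2e}")
```

With these numbers the fitted average effects are 0.05 and -0.05, and P = 0.0275
splits into G = 0.005 and E = 0.0225, so that G/P is about 0.18.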

The total change in fitness 

I prepare my exposition of the fundamental theorem of natural selection by first 
expressing the components of the total change in fitness. I use Eq. (10) as a
starting point, then transform the components into terms that can be ascribed
to the transmissible particles. For example, I measure the direct effect of natural
selection as the change in the character that arises from the changed frequency
of the transmissible particles caused by natural selection. In this case, I analyze 
fitness itself as the character of interest. 

We can express the fitness of the yth type in terms of the predicted fitness
deviation, $g_y$, and the residual, $\epsilon_y$, by combining the definition of the fitness
deviation, $a_y = m_y - \bar{m}$, with Eq. (11), yielding

$$m_y = \bar{m} + g_y + \epsilon_y.$$

The predicted value of the fitness for the yth type is obtained by dropping the
residual term and using only the predicted deviations, $g_y$, yielding the predicted
fitness

$$v_y = \bar{m} + g_y.$$

With these definitions, we can analyze the total change in fitness over the scale $d\theta$
by focusing on the predicted fitnesses given by $v$, because

$$\dot{\bar{m}} = (\bar{m}' - \bar{m})/d\theta = \Big(\sum_y p'_y v'_y - \sum_y p_y v_y\Big)/d\theta = \dot{\bar{v}},$$

the equality derived by the fact that $\sum_y p'_y g'_y = \sum_y p_y g_y = 0$. From Eq. (10), using
$v$ in place of $z$, we can write

$$\dot{\bar{v}} = \sum_y \dot{p}_y v_y + \sum_y p'_y \dot{v}_y.$$

Next, expand the first term on the right, using the fact from Eq. (7) that, for
$z = g$, we can write $\sum_y \dot{p}_y g_y = \mathrm{Cov}(m, g)$, thus

$$\begin{aligned}
\sum_y \dot{p}_y v_y &= \sum_y \dot{p}_y g_y \\
&= \mathrm{Cov}(m, g) \\
&= \mathrm{Cov}(g + \epsilon, g) \\
&= \mathrm{Cov}(g, g) \\
&= G \\
&= \dot{\bar{m}}_G,
\end{aligned}$$

where $\dot{\bar{m}}_G$ arises by analogy with Eq. (7); here the subscript G denotes that this
term quantifies the rate of change in fitness explained directly by the predicted
fitness deviations, g, based on the hereditary particles.

The second term, from Eq. (9), is

$$\sum_y p'_y \dot{v}_y = \dot{\bar{m}}_{E|G},$$

where the subscript $E|G$ denotes environmental changes that alter the transmission
of character value independently of changes in frequencies caused by selection.
Here, the context is G, because the value measured by v arises as a prediction
based on the hereditary particles, rather than the actual value.

Putting the pieces together, we can express the rate of total change in fitness
as

$$\dot{\bar{m}} = \dot{\bar{m}}_G + \dot{\bar{m}}_{E|G}. \tag{16}$$

The fundamental theorem of natural selection

Fisher (1958) partitioned the total change in fitness into two components. First,
he ascribed to natural selection the part of total change caused by differential
success of types and consequent change in the frequency of the hereditary particles
(Price, 1972b; Ewens, 1989). Fisher calculated this natural selection component
by ignoring any change in the average effects of the hereditary particles, because he
assumed that such changes were not caused directly by natural selection. Second,
following Fisher, I have called any component that depends on changes in the
average effects of the hereditary particles a component of environmental change
(Frank & Slatkin, 1992).

Holding the environment constant and thereby ignoring any changes in the
average effects of the hereditary particles, $\dot{\bar{m}}_{E|G} = 0$, we have, from the previous
section, Fisher's fundamental theorem of natural selection as the fisherian partial
change in fitness ascribed to natural selection

$$\dot{\bar{m}}_f = \dot{\bar{m}}_G = G, \tag{17}$$

where the subscript $f$ denotes the fisherian partial change. Fisher expressed G
as the genetic variance in fitness, which we can see from the definition of G in
Eq. (14), because the average value of g is zero.

I showed that G is also the Fisher information about fitness captured by the
hereditary particles. Thus, the fundamental theorem can be expressed as: the rate
of change in fitness caused by natural selection is equal to the Fisher information 
about the environment captured by the hereditary particles. 

The fundamental theorem and frequency fluctuations
of hereditary particles

The fundamental theorem expresses that part of total change in fitness caused by 
changes in the frequencies of the hereditary particles. However, the statement of 
the theorem in Eq. (17) does not show explicitly how G is related to the changes
in the hereditary particles. In this section, I make explicit the relation between 
the Fisher information captured by the hereditary particles and the changes in the 
frequencies of the hereditary particles. 

Begin, from Eq. (14), by noting that the Fisher information captured by the
population is written as

$$G = \sum_y p_y g_y^2 = \mathbf{g} \cdot \mathbf{g},$$

where $\mathbf{g} = \{\sqrt{p_y}\,g_y\}$ is a point in D-dimensional space over $y = 1, \ldots, D$. Rewrite
this expression for G as

$$\sum_y p_y g_y^2 = \mathrm{Cov}(g, g) = \mathrm{Cov}(a, g) = \sum_y p_y a_y g_y = \sum_y \dot{p}_y g_y,$$

because $a = g + \epsilon$, and $\mathrm{Cov}(\epsilon, g) = 0$. Now,

$$\begin{aligned}
\sum_y \dot{p}_y g_y &= \sum_y \dot{p}_y \sum_j x_{yj} \alpha_j \\
&= \sum_j \Big(\sum_y \dot{p}_y x_{yj}\Big) \alpha_j.
\end{aligned}$$

Next, expand the inner summation

$$\begin{aligned}
\sum_y \dot{p}_y x_{yj} &= \Big(\sum_y p'_y x'_{yj} - \sum_y p_y x_{yj}\Big)/d\theta \\
&= n(r'_j - r_j)/d\theta \\
&= n\dot{r}_j,
\end{aligned}$$

where I assume that any change in $x_{yj}$ in the descendant is ascribed to a change in
the environment, because changes in the state of hereditary particles are not direct
consequences of natural selection. Put another way, $x'_{yj} = x_{yj}$, where $x'_{yj}$ denotes
the hereditary particle derived from $x_{yj}$. If the actual state of the hereditary
particle at $\theta'$ differs from $x_{yj}$, then we account for this through the change in the
average effect (see Eq. (19) below).

Putting the pieces together

$$G = n \sum_j \dot{r}_j \alpha_j = \mathbf{g} \cdot \mathbf{g}, \tag{18}$$

which means that we can alternatively express the location of the predicted fitness
fluctuations as $\mathbf{g} = \{\sqrt{n \dot{r}_j \alpha_j}\}$ in the M-dimensional space over $j = 1, \ldots, M$. If
the particles, x, are correlated within individuals, then the $\dot{r}_j$ may be correlated
such that $\mathbf{g}$ is confined to a subspace of lower dimension than M. Such correlation
may arise in biology from nonrandom mating or linkage. Correlation would not 
affect any conclusion, but may force the predicted fitness fluctuations, g, to be 
farther from the observed fitness fluctuations, a. 

The previous section showed that, by the fundamental theorem, the rate of 
change in fitness captured by the hereditary particles is the Fisher information 
captured by the hereditary particles, G. We see in Eq. (18) a more direct expression
of G in terms of the fluctuations in the frequencies of the hereditary particles. 
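
The three expressions for G can be compared on a small example. This is a
sketch under stated assumptions, not data from the paper: a hypothetical one-locus
diploid population in which each individual carries n = 2 alleles, with made-up
frequencies and fitness deviations. The average effects are entered as given; they
are the least-squares values for these particular numbers, which the code confirms
by checking that the residual is uncorrelated with the prediction.

```python
# Sketch: G computed three ways for hypothetical data.

n = 2                          # particles (alleles) carried per individual
p = [0.25, 0.5, 0.25]          # frequencies p_y of types AA, Aa, aa
x = [(2, 0), (1, 1), (0, 2)]   # x_yj: copies of allele j in type y
a = [-0.05, 0.15, -0.25]       # observed deviations a_y = pdot_y / p_y
alpha = [0.05, -0.05]          # average effects (least-squares values for these data)

pdot = [py * ay for py, ay in zip(p, a)]                       # frequency fluctuations
g = [sum(aj * xj for aj, xj in zip(alpha, row)) for row in x]  # predicted deviations

# Least-squares condition: residual uncorrelated with prediction
resid_cov = sum(py * (ay - gy) * gy for py, ay, gy in zip(p, a, g))

# (i) weighted squared length of g, (ii) sum_y pdot_y g_y
G_from_g = sum(py * gy ** 2 for py, gy in zip(p, g))
G_from_pdot = sum(pd * gy for pd, gy in zip(pdot, g))

# (iii) n * sum_j rdot_j alpha_j, with rdot_j = sum_y pdot_y x_yj / n
rdot = [sum(pd * row[j] for pd, row in zip(pdot, x)) / n for j in range(len(alpha))]
G_from_r = n * sum(rj * aj for rj, aj in zip(rdot, alpha))

print(G_from_g, G_from_pdot, G_from_r, resid_cov)
```

All three routes give G = 0.005 here. The second and third agree with the first
only because the average effects satisfy the least-squares condition.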

An alternative form for the average effects sets the partial change of the
fundamental theorem in the context of total evolutionary change. Express the average
effect of a hereditary particle as $b_j = \bar{m}/n + \alpha_j$, and the effect in the changed
population as $b'_j = \bar{m}'/n + \alpha'_j$; thus, $\bar{m} = n\bar{b}$. We can, by analogy with Eq. (10),
express the total change in fitness as

$$\dot{\bar{m}} = G + n \sum_j r'_j \dot{b}_j. \tag{19}$$

The first part, G, is the partial change ascribed to natural selection; this is the
part that comprises the fundamental theorem. The second part provides an explicit
description for the remaining part of total change.

If we assume that the hereditary particles have constant effects on fitness, we
obtain $\dot{b}_j = 0$. By contrast, constant effects lead to $\dot{\alpha}_j = -\dot{\bar{m}}/n$, because the
constraint that $\sum_j r_j \alpha_j = 0$ forces the average effects to adjust for changes in the
frequencies of the particles and consequent change in mean fitness. Thus, to study
the total change in fitness, $b_j$ is a more natural metric. For example, changes in the
$b_j$ may reflect true changes in effects in response to changes in particle frequencies,
rather than adjustments for changes in the mean.
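
The relation between the two kinds of average effect can be spelled out in a few
lines. The sketch below assumes, as my reading of the relation between mean fitness
and the mean of the $b_j$, that every type carries the same number $n$ of particles,
so that $\sum_j x_{yj} = n$ and $\sum_j r_j = 1$:

```latex
\begin{align*}
b_j &= \frac{\bar{m}}{n} + \alpha_j, \\
n \sum_j r_j b_j &= \bar{m} \sum_j r_j + n \sum_j r_j \alpha_j = \bar{m}, \\
\dot{b}_j = \frac{\dot{\bar{m}}}{n} + \dot{\alpha}_j = 0
  \quad&\Longrightarrow\quad \dot{\alpha}_j = -\frac{\dot{\bar{m}}}{n}.
\end{align*}
```

Under constant effects, the $\alpha_j$ therefore shift only because the mean against
which they are measured has moved.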

The fisherian partial change in fitness arises through the frequency changes 
of particles when holding constant the average effects. Thus, the $\dot{b}_j$ describe all
aspects of change other than the fisherian partial change in fitness. When
deciding how to choose the set of hereditary particles, a natural approach would
be to increase G and reduce the second term on the right of Eq. (19); in other
words, one might choose the most stable set of hereditary particles that explain the 
largest fraction of the total change through G. This criterion provides guidance 
for whether to consider nongenetic factors as hereditary particles. 

Scope of the fundamental theorem 

In traditional population genetics, one usually specifies at the outset particular 
assumptions about how individuals mate, how genes mix, and how genes are linked 
to each other. The scope of derivations from such assumptions remains limited 
to the particular systems of mating and mixing specified. By contrast, I did not 
make any specific assumptions about mating and mixing. Rather, my derivation 
transcends traditional genetics and applies to any dynamical system with a clear 
notion of success based on selection. My hereditary particles are just any predictors 
that can be identified and associated with success in selection — that is, with fitness. 

I divided the original population into subsets indexed by y. We may consider 
the index y to be different individuals, different genotypes, or any other partition — 
it does not matter. I defined the frequency of type y in the original population 
as $p_y$. The expression $p'_y = (w_y/\bar{w})p_y$ defines fitness, where $p'_y$ is the fraction of
the descendant population derived from members of the class given by index y
in the original population. Thus, fitness is simply a descriptive mapping between
two populations. If two or more sexes mix their hereditary particles to make
each descendant, then we assign to each parental contributor a fraction of the 
descendant. 

We may, for example, consider subsets of the set $y = 1, \ldots, D$ to be different
sexes or different classes of individuals, each sex or class with potentially a different 
net contribution to the subsequent population. In population genetics, we call the 
net contribution of each class the class reproductive value. However, all of those 
details automatically enter into the values of fitness, $w_y$, because I have defined
$w_y$ to be a mapping that measures fully the contribution of y to the descendant
population. Complex mappings between populations may require a finely divided 
classification by y. But the system works for any pair of populations for which one 
can draw a map that associates components of each descendant to an ancestral 
entity. 
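
As a sketch of how such a map assigns fitness when two parents contribute to each
descendant, the following hypothetical example credits each parent's type with half
of every offspring and then reads off relative fitness from the ratio of descendant
to ancestral frequencies. The individuals, types, and pedigree are invented for
illustration, and the equal split between parents is an assumption.

```python
from collections import Counter

# Hypothetical ancestral population: individual -> type index y
parent_type = {"f1": 0, "f2": 1, "m1": 0, "m2": 2}

# Hypothetical descendants, each recorded as (mother, father)
offspring = [("f1", "m1"), ("f1", "m2"), ("f2", "m2"), ("f2", "m2"), ("f1", "m2")]

# Ancestral frequencies p_y
p = Counter()
for y in parent_type.values():
    p[y] += 1 / len(parent_type)

# Assign half of each descendant to the type of each parent
credit = Counter()
for mother, father in offspring:
    credit[parent_type[mother]] += 0.5
    credit[parent_type[father]] += 0.5

# Fraction of the descendant population derived from each ancestral type
p_prime = {y: credit[y] / len(offspring) for y in p}

# Relative fitness w_y / w_bar, from p'_y = (w_y / w_bar) * p_y
rel_fitness = {y: p_prime[y] / p[y] for y in p}

for y in sorted(p):
    print(y, p[y], p_prime[y], rel_fitness[y])
```

Here types 0, 1, and 2 start at frequencies 0.5, 0.25, and 0.25 and map to 0.4, 0.2,
and 0.4, giving relative fitnesses of 0.8, 0.8, and 1.6. Nothing about mating or
inheritance enters beyond the map itself.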

Another commonly discussed issue concerns correlation, or linkage, between 
hereditary particles. My geometric expressions for the population distribution of 
fitnesses and the predicted fitnesses showed that correlation between hereditary 
particles may constrain the location of the vector that characterizes the predicted 
fitnesses — or, we may say equivalently, the predicted frequency fluctuations. But 
such constraints do not change the fundamental properties by which distance is 
minimized in setting the predictor vector and determining the orthogonal (uncor- 
related) directions of the prediction and mismatch vectors, g and e. 

In short, my derivation applies to any selective system for which a proper 
mapping between populations can be defined to express the fitness relations,
$p'_y = (w_y/\bar{w})p_y$. If one can express the fractions of the descendant population that
derive from the ancestral population, then the fundamental theorem follows. This 
expression, in its abstract form, is a general description of selection for any system. 

What about the hereditary particles? All that we need is to express a prediction 
about fitness, $w_y$, in terms of some predictors associated with y. The theorem
works with any set of predictors. But, with such abstract generality, there is no 
single realization of the fundamental theorem, because there is no single set of 
predictors or hereditary particles that exist. One can use alleles and follow allele 
frequency changes, as Fisher did. But one could also include inherited pathogens,
maternal effects, cultural predictors however defined, or the interaction effects
predicted by combinations of predictors. 

The natural set of hereditary particles should probably be those predictors 
that remain most stable during transmission; otherwise, the changes caused by 
selection disappear in the descendant population, because large changes in average 
effects cause much of total change to be ascribed to the environment according to 
Eq. (19). One might define an optimality criterion to delimit the set of predictors
and hereditary particles with regard to stability of the particles and the amount 
of Fisher information captured by the particles. 

From a different perspective, one may analyze how the effectiveness of natural 
selection rises with an increase in the stability of the hereditary particles. It may 
be that natural selection itself favors an increase in the stability of the hereditary 
components, thereby separating the rate of change between selective and environ- 
mental components of evolution. Such time scale separation forms the basis for 
the subject of niche construction (Odling-Smee et al., 2003; Krakauer et al., 2008). 

Fisher information, measurement, and dynamics 

Fisher information fits elegantly into a framework of natural selection and evolu- 
tionary dynamics. But is the fit of Fisher information with evolutionary dynamics 
truly meaningful, or is the fit simply an outcome of altered notation? 

I answer in two ways. First, Fisher information does express evolutionary 
dynamics by a simple alteration of notation. This must be so, because I showed 
that Fisher information provides a measure of change, and change arises from 
dynamics. 

Second, although Fisher information describes dynamics, it also represents a 
different perspective. Typically, in the study of dynamics, one begins with initial 
frequencies over states, and rules for change. From the initial conditions and rules 
for change, one derives the changed frequencies. By contrast, Fisher information
takes the initial and changed populations as given, and asks how much informa- 
tion can we obtain from the observed changes about the selective processes that 
determine those changes.

Put another way, we can think of dynamics as depending on three variables:
initial state, changed state, and rules that determine change. Given any two, we 
gain information about the third. 

Natural selection is a rule that governs change. We generally cannot observe 
or directly measure natural selection. We can only infer natural selection from 
observed frequency fluctuations. For this reason, Fisher information is a very clear
and important perspective on natural selection: Fisher information measures how 
much we can learn about the unobserved selective processes of nature from the 
observed frequency fluctuations. 

By contrast, traditional dynamical theory starts with an initial state and a 
hypothesized rule for change imposed by selection, and then makes a prediction 
about the changed state that can be compared with observation. 

There is no a priori reason to conclude that the traditional dynamical theory 
is better or worse than the Fisher information perspective. Each emphasizes a 
different view of dynamics. I prefer the Fisher information perspective, because 
we can say exactly how much Fisher information about natural selection we can 
obtain from observed frequency fluctuations and a particular set of hereditary 
particles. That seems like a statement of greatest generality that can be used to
understand natural selection. 

By contrast, from the traditional dynamical view, all we can say is that some 
part of the total change in fitness is caused by natural selection and there is some 
remainder term. That may be useful in some situations, but it does not seem to be 
very general or powerful with regard to studying natural selection. The limitation 
of the traditional dynamical view probably explains why Fisher's fundamental 
theorem has been almost universally misinterpreted. 

Shannon information compared with Fisher
information

Fisher never related his fundamental theorem of natural selection to Fisher
information. When he presented his theorem, he did draw an analogy between the
fundamental theorem and the second law of thermodynamics. Following Fisher,
some have tried to relate natural selection to thermodynamics and measures of 
information, but never with much success. I think the problem arose because 
thermodynamics suggested entropy and measures of information from the Shan- 
non index family rather than Fisher information. I do not know of anyone who 
has clearly related Shannon information and entropy to evolutionary dynamics and 
Fisher's fundamental theorem. 

We have seen that Fisher information provides a natural metric for evolutionary 
dynamics; the question here concerns how Shannon information and entropy relate 
to Fisher information and evolutionary dynamics. In this section, I derive a simple 
relation between Shannon information and Fisher information. 

Entropy in physics is defined as $S = -\sum_y p_y \log(p_y)$; the Shannon index of
information has the same definition but is usually denoted by $H$. A vast literature
debates whether S and H are conceptually equivalent or, alternatively, whether
the corresponding forms of S and H are merely coincidental (Ben-Naim, 2008).
The relations between information and thermodynamics do not concern me here; 
I focus on the relations between Shannon information and Fisher information in 
the context of understanding dynamics. 

The component of evolutionary dynamics ascribed to natural selection, G, 
arises by maximization of Fisher information. Fisher information measures the 
observed acceleration in the frequencies of a probability distribution with respect
to some unobserved parameter (force); from the observed acceleration, one ob- 
tains information about the unobserved force. These relations can be seen in the 
definition of the Fisher information about a parameter $\theta$, given earlier as

$$F = -\sum_y p(y|\theta)\,\frac{\partial^2 L(\theta|y)}{\partial \theta^2}, \tag{20}$$

where $L = \log[p(\theta|y)]$ is the log-likelihood of $\theta$ given an observed value of y.

I now propose an interpretation by which Fisher information is equivalent to
the acceleration of Shannon information. I show that, if we differentiate Shannon
information twice with respect to a parameter, we obtain Fisher information.
However, this correspondence only arises if we assume a particular interpretation
of what it means to measure the acceleration of Shannon information with respect
to a parameter.

To begin, let us consider Shannon information from the same perspective as
Fisher information: how much do we learn about a parameter, $\theta$, from an
observation of the random variable, $y$? We can rearrange Shannon information as

$$\begin{aligned}
H &= -\sum_y p(y|\theta) \log[p(\theta|y)] \\
&= -\sum_y p(y|\theta)\, L(\theta|y).
\end{aligned}$$

To obtain the acceleration of Shannon information, we differentiate H twice with
respect to $\theta$. To differentiate, we may consider two choices.

First, we may think of both $p(y|\theta)$ and $L(\theta|y)$ as functions of $\theta$, and apply the
chain rule. This approach makes sense if we wish to study the total change in the 
information or entropy measure between two different distributions. We may, for 
example, wish to find a distribution that maximizes the total entropy. To find such 
a distribution, we would need to study the total entropy change between different 
distributions. 

Second, we may regard $p(y|\theta)$ differently from $L(\theta|y)$ with respect to the
parameter $\theta$. In particular, $-L(\theta|y)$ may be thought of as the information that we
obtain about the variable $\theta$ given an observation y. In the theory of Shannon
information, $-L(\theta|y)$ is referred to as "self-information": a measure of the information
obtained from the observation of a particular value of a random variable. Tribus
(1961) called $-L(\theta|y)$ the "surprisal": a measure of the surprise in observing a
particular outcome, y.

By these interpretations of $-L$ as the measure of information in an observation,
we obtain the expected surprise, or Shannon information, H, by averaging $-L$
over the distribution $p(y|\theta)$. Thus, if we want a measure of the expected second
derivative, or acceleration, of the information in an observation, it makes sense to
hold $p(y|\theta)$ constant, and differentiate only the information, $-L$.

If we differentiate $-L$ twice with respect to $\theta$, and then average over the
distribution $p(y|\theta)$, we obtain the acceleration of Shannon information, H, as the
expected acceleration of $-L$. In this case, the acceleration of Shannon information
equals Fisher information, as in Eq. (20).

From Eq. (2) and Eq. (3), the expected acceleration of the likelihood is

$$-\sum_y p(y|\theta)\,\frac{\partial^2 L(\theta|y)}{\partial \theta^2} = \sum_y p_y \left(\frac{\dot{p}_y}{p_y}\right)^2.$$

We may read this as: the acceleration of information with respect to $\theta$ is the
squared distance between probability distributions over the scale $\theta$, taken from
the perspective of the distribution at the point $\theta$. Taking the perspective at $\theta$
means that we weight the distances at each value y by the probability $p(y|\theta)$.

This measure of acceleration provides a natural correspondence between dynamics
and information. From observed dynamics, $\dot{p}_y$, we learn information about
an unobserved parameter, $\theta$. Put another way, from observed acceleration, $(\dot{p}_y/p_y)^2$,
we learn about an unobserved force embedded in the scale $\theta$. The correspondence
between acceleration and force is the fundamental principle behind all dynamics. 
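
The equality between the expected acceleration of $-L$ and the squared-distance form
can be checked numerically. The sketch below is an illustration under assumptions of
my own: a hypothetical three-outcome family with $p(y|\theta)$ proportional to
$\exp(\theta s_y)$, the log-likelihood read as $L(\theta|y) = \log p(y|\theta)$, and
derivatives taken by central finite differences with the weights $p(y|\theta)$ held
fixed.

```python
import math

s = [0.0, 1.0, 2.0]   # hypothetical scores that define the family


def probs(theta):
    """Return p(y|theta), proportional to exp(theta * s_y)."""
    w = [math.exp(theta * sy) for sy in s]
    z = sum(w)
    return [wy / z for wy in w]


def log_lik(theta):
    """Return L(theta|y) = log p(y|theta) for every outcome y."""
    return [math.log(py) for py in probs(theta)]


theta, h = 0.3, 1e-4
p = probs(theta)
L_minus, L_0, L_plus = log_lik(theta - h), log_lik(theta), log_lik(theta + h)

# Central finite differences in theta, one value per outcome y
L_dot = [(lp - lm) / (2 * h) for lp, lm in zip(L_plus, L_minus)]
L_ddot = [(lp - 2 * l0 + lm) / h ** 2 for lp, l0, lm in zip(L_plus, L_0, L_minus)]

# Expected acceleration of -L, averaging with p(y|theta) held fixed
accel = -sum(py * ld for py, ld in zip(p, L_ddot))

# Squared-distance form: sum_y p_y (pdot_y / p_y)^2, using pdot_y / p_y = dL/dtheta
fisher = sum(py * ld ** 2 for py, ld in zip(p, L_dot))

print(accel, fisher)
```

The two numbers agree up to finite-difference error. Differentiating the weights
$p(y|\theta)$ as well, as in the first of the two choices described earlier, would
give a different quantity.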

Frieden (2004) has shown that many aspects of dynamics in physics can be 
derived from a principle that in effect maximizes the Fisher information in the 
frequency fluctuations that characterize dynamics. With regard to Shannon in- 
formation, and the equivalent measure of entropy, Martyushev & Seleznev (2005) 
review many lines of evidence that dynamical trajectories follow paths that maxi- 
mize the gain of entropy or Shannon information. 

In both Frieden (2004) and Martyushev & Seleznev (2005), the fundamental 
correspondence between information (entropy) and dynamics arises by variational 
principles, in which dynamical paths and, equivalently, information measures, are 
extremized subject to external conditions imposed. I have shown that, with regard 
to force, acceleration, and dynamics, Fisher information and Shannon information
are equivalent. 

So, in the end, we return to the question I posed in the introduction. Is the 
correspondence between evolutionary dynamics and information fundamental and 
useful to our way of thinking? I believe that dynamics and information are two 
alternative perspectives of the same phenomenon: the dynamical view begins with 
an observed or supposed force and deduces acceleration; the informational view 
begins with observed or supposed acceleration and induces force.

Maximization of Fisher information in biology and
other sciences



Frieden (2004) has argued that Fisher information is the natural metric for all of 
the sciences. Frieden's own work has concentrated primarily on physics. I know 
little of physics, and so I cannot comment extensively on Frieden's work and its 
relation to my discussion of Fisher information in evolutionary dynamics. However, 
I did get the idea of applying Fisher information to natural selection from reading 
Frieden. The most important point of connection between Frieden's physics and 
my analysis arises from the role played by the maximization of Fisher information. 

In Frieden's framework, the physical constraints that define the dynamics of 
a particular natural phenomenon contain an intrinsic amount of information, J. 
Observation of the dynamics, measured in terms of frequency fluctuations, trans- 
fers information about the phenomenon to the data, yielding a level of information 
in the data about the phenomenon, I. Observations may not completely capture 
all information in the phenomenon, thus $I \le J$; we can write $J - I = -K$, where
$-K \ge 0$ is the information lost. If one quantifies the informational measures in
terms of Fisher information, then Frieden shows that physical phenomena typically
minimize $-K$. Minimization of $-K$ means that measurement transfers the maximum
amount of Fisher information from $J$ to $I$, that is, from the phenomenon to
the data. 

In most physical problems, the bound information J is not observed directly. 
Instead, $J$ acts as the unobserved source for the information $I$ received in the
data. The value of $J$ must be inferred, such that the information in observed
frequency fluctuations, $I$, plus minimization of $-K$ derives the correct description
of dynamics. 

In the biological problem that I analyzed, the bound information about the 
environment, P = J, arises from the total frequency fluctuations observed in the 
population; the captured information, G = I, arises from weighting the frequency 
fluctuations of the hereditary particles so as to maximize the prediction of the total
population fluctuations; and the information lost is $E = -K$. Maximizing the
prediction inherent in G is equivalent to minimizing $P - G = E$, or, equivalently, to
maximizing the Fisher information G subject to the fixed value of P set by observation.
Thus, my analysis of natural selection has a similar structure to Frieden's
method, but differs with regard to how one interprets the bound information and 
the information in the data. 

Leaving aside physics, how should we interpret the maximization of Fisher 
information in our understanding of natural selection? This question leads us 
back to the two alternative frames of reference with respect to dynamics, each 
frame with distinct and complementary lessons. 

The traditional view of dynamics is: an initial probability distribution over 
states plus rules of change lead to a prediction of an altered distribution over 
states. One analyzes the quality of the hypothesized rules of change by the distance 
between the predicted and actual distribution over observed states after dynamical 
change. This view is inherently deductive: one arrives at the rules of change 
by deducing them from extrinsic principles or hypotheses. One then tests the 
deductive predictions against the observed distribution after dynamical change. 

The Fisher information view of dynamics is: the distance between the initial 
and subsequent distributions over states provides information about the unob- 
served rules of change. One measures the quality of the information about the 
unobserved rules by Fisher information. This view is inherently inductive: one 
arrives at estimates of the rules of change by iterative accumulation of informa- 
tion measured at particular points, weighed against decay of information as the 
rules change with respect to the points of measurement. In biology, the hereditary 
particles, or predictors, are the stores of information, and represent inductively 
achieved hypotheses that are tested in each round of measurement. 
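
As a rough illustration of the distance between initial and subsequent distributions, the following sketch uses one common discrete form of the Fisher information distance, the sum of squared frequency changes each divided by the initial frequency. The definition used elsewhere in this paper may differ in scaling, and the frequencies and fitnesses below are hypothetical. Under the assumption that selection is the only force, so that each frequency changes in proportion to relative fitness, this distance equals the variance in fitness divided by the square of mean fitness.

```python
# Hypothetical frequencies and fitnesses for three types (illustrative only).
q = [0.2, 0.5, 0.3]
w = [0.9, 1.0, 1.4]

# Mean fitness and the selection-only update q' = q * w / wbar.
wbar = sum(qi * wi for qi, wi in zip(q, w))
q_next = [qi * wi / wbar for qi, wi in zip(q, w)]

# Squared distance between the two distributions in the Fisher metric.
fisher_dist = sum((qn - qi) ** 2 / qi for qi, qn in zip(q, q_next))

# Variance in fitness, weighted by the initial frequencies.
var_w = sum(qi * (wi - wbar) ** 2 for qi, wi in zip(q, w))

print("Fisher distance:", fisher_dist)
print("Var(w) / wbar^2:", var_w / wbar ** 2)
```

The two printed values agree up to floating-point error, so in this simple setting the observed frequency fluctuations alone carry the information about the variance in fitness.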

Fisher information seems the perfect framework for analyzing the role of nat- 
ural selection in evolutionary dynamics. Natural selection must accumulate infor- 
mation about the environment inductively, acquiring information by changes in 
the frequencies of the hereditary particles. The interesting problem concerns how 
much of the total information about the environment, P, transfers to the popula- 
tion through the information gain by the hereditary particles, G. I showed that 
Fisher's fundamental theorem of natural selection follows from the assumption 
that G contains the maximum amount of Fisher information about the environment 
that can be captured in the frequency fluctuations of the hereditary particles. 

Why should G be maximized? Because the population will dynamically move 
toward its maximum fitness at a rate predicted exactly by the maximization of G, 
given a set of hereditary particles that remain stable in their average effect and 
do not change in their predictions (or causes) of fitness (Robertson, 1966; Crow 
& Nagylaki, 1976; Ewens, 1992). Thus, in the absence of any evolutionary force 
other than natural selection, we see the direct effect of natural selection and its 
action in maximizing the accumulation of Fisher information with respect to the 
hereditary particles. 

This line of thought does not in any way require that the hereditary particles 
actually remain stable. In any realistic situation, the effects of the hereditary 
particles will change, for example, if the effects depend on the frequencies of the 
particles. The argument does show how to isolate the direct effect of natural selec- 
tion. That direct effect always moves the population in the direction of increasing 
fitness at a rate that arises from the maximization of Fisher information captured 
by the hereditary particles. 
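
The distinction between the direct effect of selection and the total change can be sketched numerically. The example assumes a simple model in which the types themselves are the hereditary particles, so that all variance in fitness is captured, and uses a hypothetical frequency-dependent fitness function; none of these specifics come from the analysis above. The direct effect holds fitnesses at their current values while frequencies change, and equals the variance in fitness divided by mean fitness. The total change re-evaluates fitnesses at the new frequencies and generally differs.

```python
def fitness(q):
    # Hypothetical frequency dependence: each type does worse as it becomes common.
    return [1.5 - qi for qi in q]

# Hypothetical initial frequencies of three types.
q = [0.2, 0.3, 0.5]
w = fitness(q)
wbar = sum(qi * wi for qi, wi in zip(q, w))

# Selection-only update of frequencies.
q_next = [qi * wi / wbar for qi, wi in zip(q, w)]

# Direct (partial) effect of selection: frequencies change, fitnesses held at w.
partial_change = sum((qn - qi) * wi for qi, qn, wi in zip(q, q_next, w))

# Variance in fitness at the initial frequencies.
var_w = sum(qi * (wi - wbar) ** 2 for qi, wi in zip(q, w))

# Total change: fitnesses are re-evaluated at the new frequencies.
w_next = fitness(q_next)
total_change = sum(qn * wn for qn, wn in zip(q_next, w_next)) - wbar

print("partial change:", partial_change)
print("Var(w) / wbar: ", var_w / wbar)
print("total change:  ", total_change)
```

The partial change is never negative and matches the variance term, whereas the total change also includes the shift in the effects of the particles as frequencies move.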